Abstract


The development of spectroscopic surveys such as the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), the Apache Point Observatory Galactic Evolution Experiment and the Sloan Digital Sky Survey has opened up unprecedented opportunities for stellar classification. Specific types of stars, such as early-type emission-line stars and stars with stellar winds, can be distinguished by the profiles of their spectral lines. In this paper, we introduce a method based on derivative spectroscopy (DS) designed to detect signals within complex backgrounds and to provide a preliminary estimate of line profiles. Compared with other popular line detection methods, it is particularly effective at identifying weak signals and unusual spectral line profiles. We validated our approach using synthetic spectra, demonstrating that DS can detect emission signals three times fainter than Gaussian fitting can. Furthermore, we applied our method to 579,680 co-added spectra from the LAMOST Medium-Resolution Spectroscopic Survey, identifying 16,629 spectra from 10,963 stars with emission peaks around the Hα line. These spectra were classified into three morphological groups comprising nine subclasses: (1) emission peak above the pseudo-continuum line (single peak, double peaks, emission peak situated within an absorption line, P Cygni profile, inverse P Cygni profile); (2) emission peak below the pseudo-continuum line (sharp emission peak, double absorption peaks, emission peak shifted to one side of the absorption line); (3) emission peak between pseudo-continuum levels.
1. Introduction


Selecting the desired stellar spectra from a massive data set is a key challenge. The continuum, the spectral lines and the profiles of these lines each encode part of a star's characteristics. In this paper, we focus on spectral line shapes, which can be broadly classified into two groups: those without emission lines and those exhibiting distinct emission-line features. The latter group encompasses several subclasses:


1. Single-peak and Double-peak types, distinguished by the 
number and wavelengths of emission line peaks (e.g., 
Zhang et al. 2022). 

2. Emission within absorption lines, P Cygni and Inverse P 
Cygni profiles, which depend on the relative positions of 
emission and absorption lines (e.g., Snow et al. 1994). 

3. Emission blend or sharp emission line profiles, classified 
based on the number and characteristics of the emission 
features (Traven et al. 2015). 


Various types of Hα lines serve as valuable references for classifying stellar systems. For instance, double-peak emission lines can arise from jets emitted from the stellar polar regions, which may be significantly inclined with respect to the observer's line of sight. Accretion disks can also produce such spectral features (e.g., Bromley et al. 1997). Double-peak absorption lines, on the other hand, may indicate a double-lined spectroscopic binary (SB2), in which each absorption component arises from one star of the binary. Stars with stellar winds or mass accretion can be identified through P Cygni or inverse P Cygni profiles, and the wind velocity can be calculated from the wavelength of the absorption peak.

General methods for classifying Hα lines include Gaussian fitting, machine learning and analysis of spectral line asymmetry, among others (e.g., Traven et al. 2015).

The fitting technique, which includes methods such as least squares, maximum likelihood (Rossi 2018) and affine-invariant Markov Chain Monte Carlo (Foreman-Mackey et al. 2013), is a general approach for separating the different components of a spectrum. It works exceptionally well when the spectral shape is known a priori. However, it struggles with spectra that fall outside the adjustable parameter space of the assumed fitting model, a common issue when dealing with unusual spectra in a large sample.

Figure 1. Derivative spectra of a Gaussian function. The original flux (OF) in the first panel is a Gaussian function; its first derivative (1st D), second derivative (2nd D) and third derivative (3rd D) are displayed in the three panels below. The three vertical black dashed lines mark the three zero-points of the third derivative; the zero-point in the rising part corresponds to the peak of the original spectrum.

Machine learning techniques, including K-Nearest Neighbor 
(KNN) (Beyer et al. 1999), Random Forest (RF) (Brei- 
man 2001), AdaBoost (Hastie et al. 2009), Naive Bayes (Webb 
et al. 2010), logistic regression (Wright 1995), Support Vector 
Machine (Noble 2006) and Artificial Neural Network (Jain 
et al. 1996), have been utilized for the classification of spectra. 
These techniques rely on specific characteristics of spectral 
lines for classification, as described in Zhang et al. (2022). The 
process involves creating a training data set by selecting a 
subset of features extracted from spectra, such as equivalent 
width or full width at half maximum (FWHM). The trained 
model is subsequently used to classify spectra based on these 
extracted features. 

The detection results of machine learning models naturally depend on the training data set. Ideally, this data set consists of samples that are already available or that have been selected from spectroscopic survey data. However, it is difficult to build a comprehensive training data set for rare spectral lines with complex structures, such as P Cygni or inverse P Cygni profiles, because such lines are hard to identify among thousands of spectra.

To address the limitations of the above methods, we turned to the derivative technique, which chemists employed in analytical spectrophotometry during the 1980s to detect and pinpoint the wavelengths of complex, poorly resolved spectral signals (O'Haver et al. 1982). A derivative is generally calculated by dividing the difference between the original spectrum $f(\lambda)$ and the same spectrum displaced by a finite wavelength interval, $f(\lambda + \Delta\lambda)$, by that interval $\Delta\lambda$. The result is assigned to the midpoint of the interval,

$$ f'\left(\lambda + \tfrac{1}{2}\Delta\lambda\right) \approx \frac{f(\lambda + \Delta\lambda) - f(\lambda)}{\Delta\lambda}, $$

and higher derivatives are obtained by repeating this procedure the desired number of times (Butler 1979). In Figure 1, this process yields the first, second and third derivatives of a Gaussian curve, shown beneath it. The negative lobe of the third derivative, bounded by two zeros (zero–minimum–zero), corresponds to the rising part of the original curve, while the positive lobe (zero–maximum–zero) arises from the declining part of the curve, as expected.

Figure 2. The bottom left panel shows a profile (solid line) formed by two Gaussian functions (dashed lines) separated by 4 Å. The fourth derivative (4th D) of this profile is displayed in the upper panel, with the blue curve representing the analytical fourth derivative. The orange, green, red and purple curves are obtained by computing finite differences of the profile with intervals of 2, 4, 6 and 8 Å, respectively. The bottom right panel depicts the same scenario for two Lorentzian functions separated by 2 Å; in this case, the four finite-difference curves use intervals of 1, 2, 3 and 4 Å.
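The repeated finite-difference procedure can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the code used in this paper: the wavelength grid, the 6563 Å center and the 1 Å width are assumed values, and `finite_difference` and `nth_derivative` are hypothetical helper names. Each differencing step assigns its result to the midpoint of the interval, and the rising zero-crossing of the third derivative recovers the peak of the Gaussian, as in Figure 1.

```python
import numpy as np


def finite_difference(wave, flux):
    """One step: (f(λ + Δλ) - f(λ)) / Δλ, assigned to the midpoint λ + Δλ/2."""
    deriv = np.diff(flux) / np.diff(wave)
    midpoints = 0.5 * (wave[1:] + wave[:-1])
    return midpoints, deriv


def nth_derivative(wave, flux, n):
    """Repeat the finite-difference step n times to obtain the nth derivative."""
    for _ in range(n):
        wave, flux = finite_difference(wave, flux)
    return wave, flux


# Noise-free Gaussian emission line (illustrative parameters, not LAMOST values).
wave = np.arange(6555.0, 6571.0, 0.05)
flux = np.exp(-0.5 * ((wave - 6563.0) / 1.0) ** 2)

w3, d3 = nth_derivative(wave, flux, 3)

# Rising zero-crossings of the third derivative: sign changes from - to +.
rising = np.where((d3[:-1] < 0) & (d3[1:] >= 0))[0]
i = rising[0]
# Linear interpolation between the two samples that bracket the crossing.
peak = w3[i] - d3[i] * (w3[i + 1] - w3[i]) / (d3[i + 1] - d3[i])
print(f"{len(rising)} rising crossing(s); peak at {peak:.3f} Å")
```

The two other zeros of the third derivative (at about ±√3 σ from the center) are falling crossings, so only the central one is reported.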

This paper introduces derivative spectroscopy (DS), a 
method aimed at extracting spectral line features and classify- 
ing profile types. In Section 2, we provide a detailed 
explanation of our methodology and assess its effectiveness 
using synthesized spectra. Moving on to Section 3, we apply 
DS to Large Sky Area Multi-Object Fiber Spectroscopic 
Telescope Medium-Resolution Spectroscopic Survey 
(LAMOST-MRS) data, leading to the identification of nine 
distinct subclasses. Subsequently, in Section 4, we engage in a 
comparative discussion involving DS, machine learning, and 
Gaussian fitting. Finally, our conclusions are presented in 
Section 5. 


2. Method and Derivative Spectroscopy 
2.1. Basic Theory 


DS offers a way to enhance spectral resolution and to isolate weak signals from background noise, making it possible to locate the maxima of spectral profiles more precisely (Stauffer & Sakai 1968). The resolution improves because peaks become narrower in higher-order derivatives (Fell 1983). Analytically, for a Lorentzian profile, the FWHM of the fourth-derivative spectrum is reduced to one-quarter of the FWHM of the original spectrum. Consequently, extrema that are obscured by overlapping components become more pronounced in higher-order derivative spectra (Butler 1979).

Figure 2 shows spectra composed of two Gaussian functions (left) and two Lorentzian functions (right), with their fourth derivatives displayed in the top panels. In both cases, the two maxima are clearly separated in the fourth derivative, and the Lorentzian pair, whose profiles have steeper slopes, is separated more distinctly.
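A minimal numerical illustration of this effect, assuming two equal Gaussian components of width σ = 2.5 Å separated by 4 Å (the component width used for Figure 2 is not stated, so 2.5 Å is our assumption). Because the separation is smaller than 2σ, the blended profile has a single maximum, whereas its analytic fourth derivative, built from the Hermite polynomial He4(t) = t⁴ − 6t² + 3, shows two strong maxima. `scipy.signal.find_peaks` is used only to count local maxima.

```python
import numpy as np
from scipy.signal import find_peaks

sigma = 2.5       # Å, assumed width of each Gaussian component
separation = 4.0  # Å, as in the left panel of Figure 2
x = np.linspace(-15.0, 15.0, 3001)  # wavelength offset in Å


def gaussian(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2)


def gaussian_d4(x, mu, s):
    """Analytic 4th derivative: He4(t) * exp(-t^2 / 2) / s^4 with t = (x - mu) / s."""
    t = (x - mu) / s
    return (t**4 - 6 * t**2 + 3) * np.exp(-0.5 * t**2) / s**4


centers = (-separation / 2, separation / 2)
profile = sum(gaussian(x, c, sigma) for c in centers)
d4 = sum(gaussian_d4(x, c, sigma) for c in centers)

# The blended profile has a single maximum ...
profile_peaks, _ = find_peaks(profile)
# ... while its fourth derivative shows two strong maxima near the component centers.
d4_peaks, _ = find_peaks(d4, height=0.5 * d4.max())
print(len(profile_peaks), np.round(x[d4_peaks], 2))
```

The height cut excludes the weak positive outer lobes of the fourth derivative, which are also local maxima.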

It is reasonable to use DS for correcting spectral backgrounds. Every spectral profile can be viewed as a general function that can be expanded into a polynomial form (Pourahmadi 1984), as follows:

$$ f(\lambda) = f(\lambda_0) + f'(\lambda_0)(\lambda - \lambda_0) + \cdots + \frac{f^{(n)}(\lambda_0)}{n!}(\lambda - \lambda_0)^n + \cdots \qquad (1) $$

The first term, $f(\lambda_0)$, which contains information about the background of the spectrum, disappears in the first derivative, while the other terms, which contain information about the slope and structure of the spectrum, are retained.
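To illustrate Equation (1) numerically, the sketch below adds an assumed constant-plus-linear pseudo-continuum to a Gaussian line and differentiates with NumPy's `np.gradient`. The constant term drops out of the first derivative (only the slope of 0.01 per Å remains), and the linear term also drops out of the second derivative. Line and background parameters are illustrative assumptions.

```python
import numpy as np

wave = np.linspace(6540.0, 6590.0, 501)  # Å, illustrative grid
line = 0.5 * np.exp(-0.5 * ((wave - 6563.0) / 1.5) ** 2)
background = 2.0 + 0.01 * (wave - 6563.0)  # assumed constant + linear pseudo-continuum


def derivative(y, x, order):
    """Apply np.gradient `order` times (central differences in the interior)."""
    for _ in range(order):
        y = np.gradient(y, x)
    return y


# First derivative: the constant term vanishes, only the slope 0.01 survives.
d1_diff = derivative(line + background, wave, 1) - derivative(line, wave, 1)
# Second derivative: the linear term vanishes as well.
d2_diff = derivative(line + background, wave, 2) - derivative(line, wave, 2)
print(np.allclose(d1_diff, 0.01), np.allclose(d2_diff, 0.0, atol=1e-9))
```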


2.2. Noise and Smoothness 


However, the derivative technique has a significant disadvantage: the signal-to-noise ratio (S/N) deteriorates progressively at higher derivative orders (O'Haver & Begley 1981). O'Haver & Begley (1981) illustrated this with an experimental spectrum consisting of a series of amplitudes without a signal, taken at discrete, equally spaced wavelength increments ($a_1, a_2, a_3, \ldots$). The noise, assumed to be independent of the signal amplitude and to follow a Gaussian amplitude probability distribution, can be expressed as the standard deviation of all the elements in the series. The standard deviation of the nth-order derivative can then be calculated using the rules of error propagation:

$$ \sigma_n = \sigma_0 \, \frac{\sqrt{(2n)!}}{n!} \qquad (2) $$

In Equation (2), $n$ is the order of the derivative and $\sigma_0$ is the standard deviation of the original (zeroth-order) series. Clearly, $\sigma_n$ increases with $n$. To address this issue and improve the S/N of derivative spectra, some form of low-pass filtering or smoothing must be applied during differentiation (O'Haver et al. 1982). This smoothing is achieved by convolving the data series with a smoothing function composed of a set of weighting coefficients (Wand & Jones 1994). Examples of such smoothing filters include the average, median, Gaussian, Wiener and Savitzky–Golay filters (Gonzales & Wintz 1987; Schafer 2011).
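The nth successive difference of independent noise is a sum of the original values weighted by the binomial coefficients C(n, k) with alternating signs, so its variance is σ0² Σ_k C(n, k)² = σ0² (2n)!/(n!)², which is the relation in Equation (2). The short Monte Carlo check below compares that prediction with measured standard deviations; the seed and sample size are arbitrary choices, and the differences are not divided by Δλⁿ (doing so would rescale both sides by the same factor).

```python
import numpy as np
from math import factorial, sqrt

rng = np.random.default_rng(42)
sigma0 = 1.0
# Signal-free series a1, a2, a3, ... with independent Gaussian noise.
noise = rng.normal(0.0, sigma0, size=1_000_000)

ratios = []
for n in range(1, 5):
    measured = np.std(np.diff(noise, n=n))                      # nth successive difference
    predicted = sigma0 * sqrt(factorial(2 * n)) / factorial(n)  # sigma_0 * sqrt((2n)!) / n!
    ratios.append(measured / predicted)
    print(f"n={n}: measured {measured:.3f}, predicted {predicted:.3f}")
```

The predicted values grow as √2, √6, √20 and √70 times σ0, which is why smoothing becomes essential beyond the first derivative.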


2.3. Method 


To address this limitation, we considered two methods. The first fits a polynomial to the subset of the original series surrounding each data point $x_0$, which can be carried out as a convolution (Press & Teukolsky 1990):

$$ y = a_0 + a_1(x - x_0) + a_2(x - x_0)^2 + \cdots \qquad (3) $$

where $a_0$ gives the (smoothed) original zeroth-order series, $a_1$ the first-order derivative series, $a_2$ the second-order derivative series (up to a factor of $1/2!$), and so on.

The second method convolves the signal with the derivative of a Gaussian kernel. This yields the derivative of the smoothed series and follows from the properties of the convolution of two generalized functions (Bracewell & Kahn 1966):

$$ \frac{d}{dx}(f * g) = \frac{df}{dx} * g = f * \frac{dg}{dx} \qquad (4) $$

where $df/dx$ and $dg/dx$ are the first derivatives of the generalized functions $f$ and $g$, respectively. Convolving the spectrum with the first derivative of the Gaussian kernel is therefore equivalent to convolving the first derivative of the spectrum with the Gaussian kernel.
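Equation (4) can be checked numerically with `scipy.ndimage.gaussian_filter1d`, whose `order=1` option convolves the input with the first derivative of a Gaussian kernel. In this sketch (the line shape, the 0.05 Å sampling and the 4-pixel kernel are illustrative assumptions), convolving with the derivative kernel and numerically differentiating the Gaussian-smoothed spectrum agree up to discretization error.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

step = 0.05  # Å per pixel (illustrative sampling)
wave = 6563.0 + step * np.arange(-200, 201)
flux = 1.0 + 0.8 * np.exp(-0.5 * ((wave - 6563.0) / 0.6) ** 2)
sigma_pix = 4.0  # kernel width in pixels (illustrative)

# Route 1: convolve with the first derivative of the Gaussian kernel.
d_kernel = gaussian_filter1d(flux, sigma_pix, order=1) / step
# Route 2: smooth with the Gaussian kernel, then differentiate numerically.
d_smooth = np.gradient(gaussian_filter1d(flux, sigma_pix), step)

inner = slice(20, -20)  # ignore edge effects
rel_err = np.max(np.abs(d_kernel[inner] - d_smooth[inner])) / np.max(np.abs(d_kernel))
print(f"max relative difference: {rel_err:.2e}")
```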

We used the savgol_filter routine from the scipy.signal submodule and gaussian_filter1d from the scipy.ndimage submodule of SciPy (Jones et al. 2001) in Python. In our tests, gaussian_filter1d was more stable than savgol_filter at mid-to-low resolution, whereas savgol_filter was more sensitive for high-resolution spectra. For consistency with the LAMOST-MRS spectra used in this study, we employed gaussian_filter1d throughout this paper. Figure 3 shows the first, second and third derivatives it yields for a Gaussian curve with an S/N of 20.
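A minimal sketch of how both SciPy routines produce smoothed derivatives of a synthetic Gaussian line with S/N = 20, as in Figure 3. The 0.1 Å pixel size, line amplitude and width, and the Savitzky–Golay window length and polynomial order are assumptions; σ_g = 0.61 Å is the kernel width quoted for LAMOST-MRS in Section 2.4. The final comparison shows how strongly both filters suppress the noise of a raw finite-difference derivative in a line-free window.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
step = 0.1       # Å per pixel (assumed sampling)
sigma_g = 0.61   # Å, kernel width quoted for LAMOST-MRS in Section 2.4
snr = 20.0
wave = 6563.0 + step * np.arange(-200, 201)
flux = 1.0 + 0.5 * np.exp(-0.5 * ((wave - 6563.0) / 0.6) ** 2)
flux += rng.normal(0.0, 1.0 / snr, wave.size)  # noise level = 1 / (S/N)

# Smoothed derivatives of order 1-3; dividing by step**k converts to per-Å^k units.
sigma_pix = sigma_g / step
derivs = {k: gaussian_filter1d(flux, sigma_pix, order=k) / step**k for k in (1, 2, 3)}

# Savitzky-Golay first derivative (window length and polynomial order are assumptions).
d1_savgol = savgol_filter(flux, window_length=31, polyorder=2, deriv=1, delta=step)

# Noise in a line-free window: raw finite differences vs. the two smoothed estimates.
free = slice(20, 150)
raw_noise = np.std(np.diff(flux[free]) / step)
print(f"raw {raw_noise:.3f}, gaussian {np.std(derivs[1][free]):.4f}, "
      f"savgol {np.std(d1_savgol[free]):.4f}")
```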

A zero-crossing in the ascending part of the third derivative marks a minimum of the second derivative, while a zero-crossing in the descending part of the first derivative marks a maximum of the original (zeroth-order) series. Consequently, the zero-point in the ascending part of the third derivative and the zero-point in the descending part of the first derivative both serve as evidence for an emission peak in the original spectrum. Conversely, a zero-point in the descending part of the third derivative and a zero-point in the ascending part of the first derivative indicate an absorption peak in the original spectrum.

For emission lines, the method is applied only in the region where the original spectrum is higher than $T_1$, shown by the solid black line in the OF panel on the left of Figure 3. The selected part, shown by the solid red line, yields a declining curve through zero in the first derivative. Additionally, the selected part of the second derivative, where it is lower than $T_2$ (also shown by a solid black line), yields a rising curve through zero in the third derivative.
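The threshold selection can be sketched as follows. How $T_1$ and $T_2$ are scaled is our assumption here: both are treated as multiples of the noise standard deviation measured in a line-free window of the flux and of the smoothed second derivative, respectively. The sampling, kernel width in pixels and line parameters are also illustrative. `scipy.ndimage.label` splits the selected pixels into contiguous runs, the counterparts of the red segments in Figure 3, and the check confirms that the largest run in each series contains the line center.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d, label

rng = np.random.default_rng(1)
step, sigma_pix = 0.1, 6.1   # assumed sampling (Å/pixel) and kernel width (pixels)
T1, T2 = 3.0, 3.5            # threshold multipliers, as quoted in Section 2.4
sigma_s = 0.05               # noise level, 1 / (S/N) for S/N = 20
wave = 6563.0 + step * np.arange(-200, 201)
flux = 1.0 + 0.5 * np.exp(-0.5 * ((wave - 6563.0) / 0.6) ** 2)
flux += rng.normal(0.0, sigma_s, wave.size)
d2 = gaussian_filter1d(flux, sigma_pix, order=2) / step**2

# Assumption: each threshold is a multiple of the noise measured in a line-free window.
free = slice(20, 150)
noise_flux = np.std(flux[free])
noise_d2 = np.std(d2[free])

# Emission-line selections: flux above the continuum by T1 * noise, and
# second derivative below -T2 * noise (a minimum of d2 marks a flux maximum).
selections = {
    "flux": (flux - 1.0) > T1 * noise_flux,
    "d2": d2 < -T2 * noise_d2,
}

center = int(np.argmin(np.abs(wave - 6563.0)))
checks = {}
for name, mask in selections.items():
    labels, n_regions = label(mask)            # contiguous selected runs
    sizes = np.bincount(labels.ravel())[1:]    # pixel count per run (skip label 0)
    largest = int(np.argmax(sizes)) + 1
    checks[name] = bool(labels[center] == largest)
    print(name, n_regions, checks[name])
```

For absorption lines, the inequalities are simply reversed.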

In the case of absorption lines, the selection parts are 
opposite to those for emission lines in both the original 
spectrum and the second derivative. In our tests, the absorption 
peak of the second derivative weakens as the FWHM of the 
original spectrum increases. This means it is challenging to 
identify the selection part in the second derivative of spectra 
with significantly broadened lines. Examples of our tests are 
shown in Figure 4, and we can observe that the disadvantage of 
the second derivative is not evident in the original spectrum. 
However, the third derivative is much more sensitive than the 
first derivative when dealing with the resolution overlap of two 
signals, as demonstrated in Figure 5. These characteristics of 
the second and third derivatives are a result of the primary 
benefits of DS highlighted earlier in this paper—background 
correction and improved spectral resolution. Hence, it is 
essential to utilize both the first and third derivatives for 
detecting peaks in the original spectrum. 

We employed three criteria to determine the presence of 
peaks in the original spectrum: 


1. The presence of a zero in the third derivative or the first derivative.

2. The pattern of negative and positive values around the zero-point: in the third derivative, negative values to the left of the zero and positive values to the right indicate an emission peak in the original spectrum, while the opposite pattern indicates an absorption peak (in the first derivative, the patterns are reversed).

3. For an emission peak, the value of the original spectrum at the zero position is higher than in its vicinity, whereas for an absorption peak it is lower.

Figure 3. Original spectra and derivatives of a Gaussian function with noise. The gray lines are derivatives obtained with a basic finite-difference method, while the blue lines (in the panels below the top one) are smoothed derivatives obtained with gaussian_filter1d. The black horizontal lines indicate the threshold applied to the original spectrum ($T_1$) in the top panels and the threshold applied to the second derivative ($T_2$) in the middle-to-lower panels. The red lines highlight the selected portions.
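The three criteria can be combined into a compact routine. The sketch below is hypothetical: `zero_crossings` and `classify_peaks` are illustrative names, only the third derivative is used, and "vicinity" in Criterion 3 is interpreted (an assumption) as the flux `half_window` pixels away on each side. On a noise-free spectrum with one emission and one absorption line, the flank crossings of each line fail Criterion 3, leaving one detection per line.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def zero_crossings(d):
    """Indices i where d changes sign between samples i and i + 1."""
    rising = np.where((d[:-1] < 0) & (d[1:] >= 0))[0]    # - to +
    falling = np.where((d[:-1] > 0) & (d[1:] <= 0))[0]   # + to -
    return rising, falling


def classify_peaks(flux, sigma_pix, half_window=10):
    """Apply Criteria 1-3 to the smoothed third derivative.

    Criterion 1: a zero-crossing exists.
    Criterion 2: rising crossing -> emission candidate, falling -> absorption.
    Criterion 3: flux at the crossing is strictly higher (emission) or lower
                 (absorption) than the flux half_window pixels away on both sides.
    """
    d3 = gaussian_filter1d(flux, sigma_pix, order=3)
    rising, falling = zero_crossings(d3)
    peaks = []
    for kind, idx in (("emission", rising), ("absorption", falling)):
        for i in idx:
            if i - half_window < 0 or i + half_window >= flux.size:
                continue  # vicinity not fully covered by the spectrum
            left, right = flux[i - half_window], flux[i + half_window]
            if kind == "emission" and flux[i] > max(left, right):
                peaks.append((kind, int(i)))
            elif kind == "absorption" and flux[i] < min(left, right):
                peaks.append((kind, int(i)))
    return peaks


# Noise-free test spectrum: emission at pixel 150, absorption at pixel 250.
x = np.arange(401)
flux = (1.0 + 0.5 * np.exp(-0.5 * ((x - 150) / 5.0) ** 2)
        - 0.4 * np.exp(-0.5 * ((x - 250) / 5.0) ** 2))
peaks = classify_peaks(flux, sigma_pix=3.0)
print(peaks)
```

On real spectra, this check would be restricted to the regions selected by the $T_1$ and $T_2$ thresholds.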


Please note that Criterion 3, while ensuring the detection of 
emission or absorption peaks, may reduce the precision of our 
method. Users should apply it based on specific circumstances. 


2.4. Parameters of Method 


We have discussed three key parameters in our method: the width of the Gaussian kernel ($\sigma_g$), the threshold for the original spectrum ($T_1$) and the threshold for the second derivative ($T_2$). These parameters ($\sigma_g$, $T_1$ and $T_2$) affect the number of detected peaks and their associated wavelengths. Once determined for a specific instrumental setup, they are held constant to keep the detection efficiency consistent across the entire spectrum sample.


The $T_1$ and $T_2$ parameters must strike a balance: set too low, they admit noise-induced false peaks; set too high, they miss genuine but weak signal peaks. Similarly, $\sigma_g$ must be chosen carefully. It should be large enough to suppress the numerical noise amplified by successive derivatives, but not so large that excessive smoothing hinders the identification of closely spaced peaks.

To determine the optimal parameters for LAMOST-MRS data, we constructed a sample set of 10,000,000 spectra with emission lines, mimicking the resolution of LAMOST-MRS. For each spectrum in this set, we varied the S/N and the amplitude of the emission line, which was generated with a Gaussian function. The parameters of the Gaussian function used in the sample set are listed in Table 1, where $\log(A/\sigma_s)$ is the logarithm of the ratio between the amplitude $A$ of the Gaussian function and the reciprocal of the S/N, $\sigma_s$, which quantifies the noise level, and $v_0$ and $\sigma_G$ denote the center and standard deviation of the Gaussian function, respectively.

Figure 4. Emission lines created by Gaussian functions with FWHM = 96.9, 193.9 and 318.5 km s⁻¹.

We uniformly selected 1000 points in $\log(A/\sigma_s)$ and generated 1000 samples at each point, each containing random noise drawn from a normal distribution with standard deviation $\sigma_s$, the reciprocal of the S/N. This approach allows us to calculate the detection efficiency of our parameter settings for each amplitude; importantly, this efficiency should be independent of the S/N.
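The construction of the synthetic sample set and the per-amplitude detection efficiency can be sketched as follows. `make_sample_set`, `detection_efficiency` and `toy_detector` are hypothetical names; the defaults v0 = 0 and a unit Gaussian width follow Table 1, while the tiny six-point grid, 200 samples and the x axis are illustrative (the paper uses 1000 points with 1000 samples each). The toy detector is a simple threshold on the central pixel and is only a stand-in for DS.

```python
import numpy as np


def make_sample_set(snr, log_ratios, n_samples, x, v0=0.0, sigma_line=1.0, seed=None):
    """Noisy Gaussian emission lines on a unit continuum (hypothetical helper).

    sigma_s = 1 / (S/N) and A = sigma_s * 10**log(A/sigma_s).
    Returns amplitudes, shape (n_points,), and spectra,
    shape (n_points, n_samples, x.size).
    """
    rng = np.random.default_rng(seed)
    sigma_s = 1.0 / snr
    amps = sigma_s * 10.0 ** np.asarray(log_ratios, dtype=float)
    profile = np.exp(-0.5 * ((x - v0) / sigma_line) ** 2)
    clean = 1.0 + amps[:, None, None] * profile  # (n_points, 1, x.size)
    noise = rng.normal(0.0, sigma_s, size=(amps.size, n_samples, x.size))
    return amps, clean + noise


def detection_efficiency(spectra, detector):
    """Fraction of samples flagged by `detector` at each amplitude grid point."""
    return np.array([np.mean([detector(s) for s in group]) for group in spectra])


snr = 20.0
x = np.linspace(-10.0, 10.0, 201)
log_ratios = np.linspace(-1.0, 1.5, 6)  # tiny illustrative grid
amps, spectra = make_sample_set(snr, log_ratios, n_samples=200, x=x, seed=3)
center = int(np.argmin(np.abs(x)))


def toy_detector(spec):
    """Toy stand-in for DS: flux at the line center exceeds 1 + 3 * sigma_s."""
    return spec[center] > 1.0 + 3.0 / snr


eff = detection_efficiency(spectra, toy_detector)
print(np.round(amps * snr, 2), np.round(eff, 3))
```

Because amplitudes are defined relative to σ_s, repeating this for several S/N values should give the same efficiency curve, which is the S/N independence noted above.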

The $T_1$ parameter, which is applied to the original spectrum and is independent of the other two parameters, was determined first. It was set to three times the standard deviation of the line-free, smooth portion of the spectrum. This threshold rejects 99.7% of spectra without a signal, starts to detect signals as faint as 0.6 times the reciprocal of the S/N ($\sigma_s$), and reaches a 100% detection efficiency when the signal reaches $3\sigma_s$. Setting $T_1$ lower would not increase the detection sensitivity but would introduce many spectra without a signal into our results.

To refine our parameter selection, we drew inspiration from the methodology of Merle et al. (2017), who assessed the precision of radial velocities computed via the cross-correlation function, comparing them with the radial velocities obtained from their Detection Of Extrema (DOE) method under various $\sigma_g$ configurations. Our own experiments confirm that, for the current data set, values of $\sigma_g$ between 0.5 and 2.5 Å are robust for our method.

The parameters $\sigma_g$ and $T_2$ are interconnected and jointly determine the selection in the second derivative. We determined $\sigma_g$ by marginalization, ultimately setting it to 0.61 Å, and then set $T_2$ to 3.5 times the standard deviation. With these settings, our method can detect signals in the second derivative with an amplitude as low as 1 times the standard deviation for signals with a half-width of 2.1 Å. Similarly, for signals with a half-width of 6.3 Å, these parameters can detect signals with an amplitude as low as 3 times the standard deviation while rejecting spectra without signals in nearly 99.85% of cases. Lowering $T_2$ would not enable the detection of lower-amplitude signals but would significantly increase the number of detected spectra without signals.

In our tests, all three parameters for LAMOST-MRS spectra within the ranges given in Table 2 proved reliable. Note, however, that $\sigma_g$ and $T_2$ are interrelated: adjusting one may require fine-tuning the other.

Figure 5. Double-peak spectra created by Gaussian functions with separations of 193, 129 and 95 km s⁻¹.

Table 1
Parameter Space of the Data Set
Parameter          Value
S/N                [10, 100] (step = 10)
A                  z z
log(A/$\sigma_s$)  i 1
$v_0$              0
$\sigma_G$         1

Table 2
Setups Used in LAMOST and the Associated Estimated Parameters
Parameter     LAMOST-MRS     LAMOST-LRS
$\sigma_g$    0.61–0.76 Å    0.6 Å
$T_1$         3–3.3          3
$T_2$         3.5–4.5        3AA

Under the current parameter settings, we observed that when the FWHM is 1.22 Å, the detection efficiencies of the first and third derivatives are nearly identical. The subtle difference is that the third derivative is better at detecting faint signals below 2.5 times the standard deviation ($\sigma_s$), while the first derivative performs better for signals above $2.5\sigma_s$. When the FWHM is below 1.22 Å, the detection efficiency of the first derivative is lower than that of the third derivative; conversely, when the FWHM is above 1.22 Å, the first derivative's detection efficiency surpasses that of the third derivative. Figure 6 shows the detection efficiencies of the first and third derivatives at FWHM values of 0.61, 1.22 and 1.83 Å. The reasons for these differing detection efficiencies are explained in Section 2.3.

Figure 6. Detection efficiency of the first derivative and third derivative at FWHM 0.61, 1.22 and 1.83 Å.


2.5. Test of Synthesis Spectra 


The sample set of artificial spectra was introduced in the previous section. The detection efficiency of the DS method for each amplitude is shown as a black solid line in Figure 8. Across the four panels, with S/N values ranging from 20 (a) to 80 (d), the detection efficiency consistently starts to rise at $0.5\sigma_s$, where it is 15.5%, and reaches 97.9% at $3.5\sigma_s$. We also replicated the Gaussian fitting detection method and conducted comparative tests on the same data set.

Gaussian fitting is a robust method for detecting the different components of a complicated spectral structure by fitting the profile of a spectral line with Gaussian functions. In Figure 7, we used two Gaussian functions to fit three Hα lines from LAMOST-MRS spectra. Each line is decomposed into two components: the green dotted line represents the emission line, and the red dotted line represents the absorption line. The blue solid line is the fitted result, and the black solid line is the original line. The fitting residuals for each spectrum are all lower than 0.1.

Figure 7. Spectra from the LAMOST-MRS data set selected as examples, with Gaussian functions fitted to their lines in terms of normalized flux (NF).
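We do not reproduce the exact fitting setup behind Figure 7; the following is a minimal sketch of a two-component decomposition (unit continuum plus an emission Gaussian minus an absorption Gaussian) with `scipy.optimize.curve_fit`. The synthetic P Cygni-like profile, noise level and initial guess `p0` are assumptions. The need for a reasonable `p0` illustrates why fitting works best when the spectral shape is known in advance.

```python
import numpy as np
from scipy.optimize import curve_fit


def two_gaussians(v, a_em, v_em, s_em, a_ab, v_ab, s_ab):
    """Unit continuum + emission Gaussian - absorption Gaussian (velocity units)."""
    emission = a_em * np.exp(-0.5 * ((v - v_em) / s_em) ** 2)
    absorption = a_ab * np.exp(-0.5 * ((v - v_ab) / s_ab) ** 2)
    return 1.0 + emission - absorption


# Synthetic P Cygni-like profile (illustrative parameters, km/s).
v = np.linspace(-600.0, 600.0, 481)
true = (0.8, 20.0, 60.0, 0.3, -150.0, 50.0)
rng = np.random.default_rng(7)
flux = two_gaussians(v, *true) + rng.normal(0.0, 0.01, v.size)

p0 = (0.5, 0.0, 80.0, 0.2, -120.0, 80.0)  # rough initial guess
popt, pcov = curve_fit(two_gaussians, v, flux, p0=p0)
residual = flux - two_gaussians(v, *popt)
print(np.round(popt, 2), f"max |residual| = {np.max(np.abs(residual)):.3f}")
```

Because the widths enter squared, a fitted width may come out negative; only its absolute value is meaningful.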

The detection efficiency of the Gaussian fit for each amplitude is also shown in Figure 8. Like DS, it reaches 99.9% at $3.5\sigma_s$, but it has difficulty detecting signals weaker than $2.7\sigma_s$, where its efficiency is 17%.


3. Test at LAMOST-MRS 
3.1. Data Selection 


The LAMOST telescope, also known as the Guo Shou Jing Telescope, is a special reflecting Schmidt telescope (Cui et al. 2012; Zhao et al. 2012). LAMOST-MRS provides spectra covering 4950–5350 Å in the blue arm and 6300–6800 Å in the red arm (Hou et al. 2018). To detect emission lines and their associated wavelengths, we specifically chose the Hα profile in the red arm, as it exhibits the most significant profile characteristics.

Spectra with low S/N make it difficult to detect peaks reliably. Consequently, we selected only Hα lines in the red arm with an S/N greater than 10 to form our sample set. To ensure consistency and facilitate further analysis, we normalized all samples using the normalization_spectrum_spline routine of the laspec module, a standard procedure for LAMOST-MRS spectra (e.g., Zhang et al. 2021). The normalization involves the following steps:


1. Each spectrum is divided evenly into 10 bins based on 
wavelength. 

2. Quadratic spline interpolation is applied to these 10 bins 
using the median flux values. 


This procedure helps to standardize and prepare the spectra 
for further analysis, making them more amenable to peak 
detection and other analytical techniques. 
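The two normalization steps can be sketched as below. This is a hypothetical re-implementation for illustration, not the laspec routine: `normalize_spline` is an assumed name, each bin's node is placed at the median wavelength of its pixels (our choice), and the quadratic interpolating spline comes from `scipy.interpolate.make_interp_spline`. The example uses an assumed sampling of the 6300–6800 Å red arm, a linear continuum and an Hα absorption line.

```python
import numpy as np
from scipy.interpolate import make_interp_spline


def normalize_spline(wave, flux, n_bins=10):
    """Divide a spectrum by a quadratic spline through per-bin median fluxes."""
    edges = np.linspace(wave[0], wave[-1], n_bins + 1)        # equal-width bins
    idx = np.clip(np.digitize(wave, edges) - 1, 0, n_bins - 1)  # bin index per pixel
    nodes = np.array([np.median(wave[idx == b]) for b in range(n_bins)])
    medians = np.array([np.median(flux[idx == b]) for b in range(n_bins)])
    continuum = make_interp_spline(nodes, medians, k=2)(wave)  # quadratic spline
    return flux / continuum, continuum


wave = np.linspace(6300.0, 6800.0, 2501)  # red-arm range, assumed 0.2 Å sampling
flux = 100.0 + 0.05 * (wave - 6300.0)
flux -= 30.0 * np.exp(-0.5 * ((wave - 6562.8) / 1.5) ** 2)  # Hα absorption
norm, cont = normalize_spline(wave, flux)
print(f"continuum-normalized flux: edge {norm[0]:.4f}, minimum {norm.min():.3f}")
```

Using bin medians rather than means keeps the narrow Hα line from pulling the continuum down.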

Figure 9 displays a normalized red-arm spectrum from LAMOST-MRS on the left and, on the right, the radial velocity profile of its Hα line. The wavelength range in the right panel is the region of interest.
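The velocity scale in Figure 9, like the wind-velocity estimate from a P Cygni absorption trough mentioned in the Introduction, comes from a Doppler shift relative to the Hα rest wavelength. A minimal sketch using the non-relativistic approximation with an assumed air rest wavelength of 6562.8 Å; the rest wavelength must match the air or vacuum convention of the spectra being analyzed. `to_velocity` is a hypothetical helper name.

```python
import numpy as np

C_KMS = 299792.458    # speed of light in km/s
HALPHA_AIR = 6562.8   # Å, assumed rest wavelength of Hα (air)


def to_velocity(wave, rest=HALPHA_AIR):
    """Non-relativistic Doppler velocity (km/s) relative to `rest`; 0 at line center."""
    return C_KMS * (np.asarray(wave, dtype=float) - rest) / rest


# A trough at 6560.6 Å (blue side) corresponds to roughly -100 km/s.
print(np.round(to_velocity([6560.6, 6562.8, 6565.0]), 1))
```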


3.2. Results 


We tested our method on a randomly selected subset of 579,680 co-added spectra from LAMOST, originating from 249,324 stars cross-matched with Gaia data. DS identified 23,804 spectra with either emission lines or double absorption peaks. We excluded 7031 spectra that exhibited very strong negative values or zero flux values around Hα. Additionally, we excluded 144 spectra (less than 0.86% of the total) that initially appeared to have absorption profiles but were misidentified as emission during the first step of data reduction, likely because of noise or cosmic rays.

Figure 9. Examples of the normalized spectrum of LAMOST-MRS and its profile of the Hα line.

In presenting our results, we often use a velocity scale for the wavelength values, with zero centered at the Hα line.

3.2.1. Morphological Classification


The DS method described in the preceding section is 
primarily a morphological analysis of spectra. It does not 
directly pertain to the underlying physics of the observed 
object. Instead, it extracts information about the wavelength 
and amplitude of emission and absorption lines from the 
derivative analysis. 
As a result, we can use a limited set of parameters to classify all 10,963 analyzed objects, along with their 16,629 spectra, into meaningful morphological categories, and potentially even into physical categories.

In our classification, we divided the 16,629 spectra into three main classes: (1) emission peak above the pseudo-continuum line (34.6%); (2) emission peak below the pseudo-continuum line (64.9%); (3) emission peak between pseudo-continuum levels (1.5%):


1. Emission Peak Above the Pseudo-continuum Line (Class I): This class comprises five sub-types (Types 1.1, 1.2, 1.3, 1.4 and 1.5). An emission peak above the continuum strongly suggests that the star is an emission-line star.

(a) Single Peak (Type 1.1): Profiles of this type are characterized by a prominent emission peak standing out above the continuum, which makes them easy to identify in our analysis.

(b) Double Peak (Type 1.2): These profiles also display explicit emission characteristics. The key difference from Type 1.1 is that Type 1.2 has two emission peaks around the Hα line. Most of them in our sample exhibit wider wings than the other spectra in Class I.

(c) Emission Peak within an Absorption Line (Type 1.3): Unlike the previous two types, this profile type features a wide, shallow absorption component overlaying the emission line.

(d) P Cygni Profile (Type 1.4): This profile type exhibits an absorption component to the left (blue side) of the Hα emission line, indicating an expanding envelope surrounding the central star.

(e) Inverse P Cygni Profile (Type 1.5): Unlike Type 1.4, the absorption component in this type is shifted to the right of the Hα emission line. This redshifted absorption component indicates a contracting envelope surrounding the star.

2. Emission Peak below the Pseudo-continuum Line (Class II): This class encompasses three sub-types (Types 2.1, 2.2 and 2.3). It suggests that the star may exhibit an emission-line profile with either deep absorption or multiple absorption peaks, which is difficult to discern through morphological analysis alone.

(a) Type 2.1: Sharp emission-peak profiles in this category feature an emission peak, identified by DS, situated at the center of a deep absorption component. The peak of the emission line remains below the pseudo-continuum line.

(b) Type 2.2: This type is similar to Type 2.1, except that the emission peak is barely discernible in the third derivative. Instead, two clearly defined absorption peaks are evident around the Hα line.

(c) Type 2.3: Profiles in this category also possess an emission peak detectable by DS. However, in Type 2.3 the emission peak is shifted to one side of the absorption, so an absorption peak lies on only one side of the emission peak.

3. Emission Peak between Pseudo-continuum Levels (Class III): Unlike the other two classes, this class exhibits an emission peak positioned between the continuum levels. Here, the emission peak is higher on the red side of the Hα line and lower on the blue side, resulting in a noticeable jump around the Hα line.


For each of the nine subclasses, we plotted the normalized flux spectra in the Hα band, along with their first, second and third derivative spectra, in Figures 10–12. The spectral classification is based on the morphological features of the Hα line profiles described above. Note that this classification is not necessarily related to the underlying physics of the stars. Nevertheless, we hope to place certain constraints on the corresponding stars based on the classification of their spectral line profiles.

For example, Type 1.2 may stem from phenomena such as stellar jets and accretion disks, whereas Types 1.4 and 1.5 indicate a stellar envelope undergoing outward and inward motion, respectively. Class II may arise from emission lines within absorption lines or from SB2 binary stars. The jumps in Class III spectra may be attributed to molecular absorption bands in cooler stars. We show the red-arm normalized flux spectrum of an example of this class in Figure 12(a), which gives a more intuitive view.


3.2.2. Classification Criteria 


We established nine subclasses and the conditions for assigning them, as detailed in Table 3. This classification system uses eight parameters (Ng, Na, NFg, RVg, NFa, RVa, NFr, NFb), which are explained below. Their values are derived directly from our method:


1. Ng and Na are the numbers of emission and absorption peaks, respectively, around Hα in the spectrum. These peaks are identified by locating the zero-crossings in the rising and declining parts of the first and third derivatives of the original spectrum.

2. NFg and RVg are the amplitude and radial velocity of the emission peaks, respectively. When Ng is greater than 1, these parameters are denoted NFg1, NFg2, NFg3, ..., and RVg1, RVg2, RVg3, ..., following the sequence from red to blue. The amplitude and radial velocity of each emission peak are derived from its normalized flux and wavelength. The normalized flux is obtained by locating the zero-crossings in the rising part of the third derivative spectrum and reading the original spectrum at those positions.
Figure 10. Hα emission profiles of Class I and their first, second, and third derivatives. The red and green solid lines mark the emission peaks and absorption peaks of the original spectrum, respectively. Note that the peak markers are derived from the first and third derivatives selected by T1 and T2, so they do not align perfectly with the zero-crossings of the unfiltered derivative spectra.
Figure 11. Same as Figure 10 but for Class II.


3. NFa and RVa represent the amplitude and radial velocity of the absorption peaks, respectively. Similarly, these parameters are labeled NFa1, NFa2, NFa3, ... and RVa1, RVa2, RVa3, ..., and are derived from the zero-crossings in the declining part of the third derivative spectrum.

4. NFr and NFb are the median values of the red and blue bands of the Hα line, respectively, each with three times the standard deviation.
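The peak-finding rules in items 1-3 can be illustrated with the minimal sketch below. It is not the authors' pipeline: the function `ds_peaks`, its `window` and `frac` parameters, and the rule that a third-derivative zero-crossing is kept only when the first derivative also crosses zero nearby are all hypothetical stand-ins for the T1/T2 selection mentioned in the Figure 10 caption. It assumes vacuum wavelengths (Hα at 6564.61 Å; pass 6562.80 Å for air wavelengths) and noise-free input, and returns peaks in order of increasing wavelength.

```python
import numpy as np

C_KMS = 299_792.458  # speed of light in km/s


def zero_crossings(y, rising):
    """Indices i where y changes sign between samples i and i + 1."""
    if rising:
        return np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0))
    return np.flatnonzero((y[:-1] > 0) & (y[1:] <= 0))


def ds_peaks(wave, flux, rest=6564.61, window=5, frac=0.05):
    """Hypothetical DS peak finder; a sketch, not the authors' implementation.

    Emission candidates are rising zero-crossings of the third derivative,
    absorption candidates are declining ones. A candidate is kept only if
    (i) the first derivative crosses zero in the matching direction within
    `window` pixels (falling for a maximum, rising for a minimum) and
    (ii) |d3| near it exceeds `frac` * max|d3| (an assumed noise threshold).
    """
    d1 = np.gradient(flux, wave)
    d3 = np.gradient(np.gradient(d1, wave), wave)
    floor = frac * np.max(np.abs(d3))
    params = {}
    for tag, d3_rising in (("g", True), ("a", False)):
        d1_idx = zero_crossings(d1, rising=not d3_rising)
        nf, rv = [], []
        for i in zero_crossings(d3, rising=d3_rising):
            if np.max(np.abs(d3[max(i - window, 0): i + window + 2])) < floor:
                continue  # too weak relative to the strongest feature
            if not np.any(np.abs(d1_idx - i) <= window):
                continue  # side lobe: no matching extremum in the flux
            j = i if abs(d3[i]) <= abs(d3[i + 1]) else i + 1  # sample nearest the root
            nf.append(float(flux[j]))
            rv.append(float(C_KMS * (wave[j] - rest) / rest))
        params["N" + tag], params["NF" + tag], params["RV" + tag] = len(nf), nf, rv
    return params


# Usage: a synthetic P Cygni-like profile (all numbers are illustrative).
wave = np.linspace(6540.0, 6590.0, 1001)  # 0.05 Å pixels
pcyg = (1.0 + 0.5 * np.exp(-0.5 * (wave - 6566.11) ** 2)
        - 0.4 * np.exp(-0.5 * (wave - 6563.11) ** 2))
print(ds_peaks(wave, pcyg))
```

The first-derivative check matters because an isolated Gaussian peak also produces two side-lobe zero-crossings of the third derivative at about ±1.73σ from its center; without the check they would be miscounted as absorption peaks. Reverse the returned lists if the red-to-blue numbering described in item 2 is wanted.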
4. Discussion


DS is one possible approach to the morphological analysis of weak spectral-line profiles: it enhances the effective resolution of the spectrum and separates weak signals from the background. Here we used the method to first detect Hα line profiles and then classify them using a few DS parameters, although it could be applied to other spectral lines in a similar way. Our comparison of DS with Gaussian fitting on artificial spectra indicates that DS is more sensitive to weak signals and can detect signals with amplitudes below 3σ.

Figure 12. (b) Example of Class III, as in Figure 10, and (a) its normalized red-arm spectrum.
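The resolution-enhancing property of DS can be illustrated with a generic example; this is not the authors' synthetic-spectrum test. Two equal Gaussians separated by 1.8σ blend into a single maximum in the flux, whereas the negative second derivative, whose central lobe is narrower than the Gaussian itself, shows two separate maxima. The grid and separation are assumptions chosen for illustration.

```python
import numpy as np


def count_maxima(y):
    """Number of strict interior local maxima in a sampled curve."""
    d = np.diff(y)
    return int(np.sum((d[:-1] > 0) & (d[1:] < 0)))


x = np.linspace(-6.0, 6.0, 1201)  # dimensionless axis in units of the Gaussian sigma
flux = np.exp(-0.5 * (x - 0.9) ** 2) + np.exp(-0.5 * (x + 0.9) ** 2)
neg_d2 = -np.gradient(np.gradient(flux, x), x)  # negative second derivative

trim = slice(5, -5)  # drop edge samples where np.gradient is one-sided
n_flux = count_maxima(flux[trim])
n_d2 = count_maxima(neg_d2[trim])
print(n_flux, n_d2)  # 1 2
```

Because differentiation also amplifies high-frequency noise, real spectra need smoothing or thresholds before this kind of peak counting is reliable.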

This advantage also holds for LAMOST-MRS spectra. We compared the two methods on spectra with confirmed emission lines; the results are shown in Figures 13 and 14. DS detects every peak in all six spectra shown in these two figures. As seen in Figure 13, two Gaussian functions can easily be fitted to the emission and absorption peaks of each spectrum. For the spectrum of J043114.44+271017.9, which has only one emission peak, both Gaussian functions are fitted to the emission peak, yielding a fit that closely matches the observed spectrum. In Figure 14, however, Gaussian fitting tends to place both Gaussian functions on the absorption lines, which have larger amplitudes and broader widths, while overlooking weak, poorly resolved emission peaks.
For the spectrum of J043114.44+271017.9, which has two emission peaks and one absorption peak, or for even more complex spectral structures, two Gaussian functions cannot fully represent the structural features. Additional Gaussian components are needed to match the different features, which not only makes the fit harder but also makes it prone to overfitting. For large-scale spectroscopic data from survey telescopes, it is impractical to judge the profile of each spectrum in advance and choose the number of Gaussian components accordingly.

Table 3
Scheme of Morphological Classification of Nine Sub-classes

Class       Type   Condition I         Condition II   Condition III   Condition IV
Class I     1.1    NFg > NFr,b         Ng = 1         Na = 0          –
            1.2    NFg > NFr,b         Ng = 2         Na = 0          –
            1.3    NFg > NFr,b         Ng = 1         Na = 2          RVa1 < RVg < RVa2
            1.4    NFg > NFr,b         Ng = 1         Na = 1          183 km s−1 > RVg − RVa > 0
            1.5    NFg > NFr,b         Ng = 1         Na = 1          0 > RVg − RVa > −183 km s−1
Class II    2.1    NFg < NFr,b         Ng = 1         Na = 2          RVa1 < RVg < RVa2
            2.2    NFg < NFr,b         Ng = ...       Na = ...        –
            2.3    NFg < NFr,b         Ng = 1         Na = 1          183 km s−1 > |RVg − RVa| > 0
Class III   3      NFr > NFg > NFb     Ng = ...       Na = ...        ...
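The Table 3 decision rules can be written as a short function. This is a hedged sketch: `classify`, its dictionary keys (named after the eight parameters in Section 3.2.2), the choice to compare the strongest emission peak when Ng > 1, and the ordering of peaks by increasing wavelength are all assumptions. Only conditions that survive in Table 3 are encoded, so Type 2.2 is not assigned and Class III uses only its flux condition.

```python
RV_LIMIT = 183.0  # km/s, the velocity bound quoted in Table 3


def classify(p):
    """Assign a Table 3 type from DS parameters; returns None if no encoded rule matches.

    `p` holds Ng, Na, NFr, NFb and lists NFg, RVg, NFa, RVa, with peaks
    ordered by increasing wavelength (an assumption of this sketch).
    Type 2.2 and the non-flux conditions of Class III are not encoded.
    """
    ng, na, nfr, nfb = p["Ng"], p["Na"], p["NFr"], p["NFb"]
    if ng == 0:
        return None
    nfg = max(p["NFg"])  # assumption: compare the strongest emission peak
    rvg = p["RVg"][0]
    if nfr > nfg > nfb:
        return "3"  # Class III: peak between the red and blue levels
    above, below = nfg > max(nfr, nfb), nfg < min(nfr, nfb)
    if above and na == 0:
        return {1: "1.1", 2: "1.2"}.get(ng)
    if ng == 1 and na == 2 and p["RVa"][0] < rvg < p["RVa"][1]:
        return "1.3" if above else ("2.1" if below else None)
    if ng == 1 and na == 1:
        dv = rvg - p["RVa"][0]
        if above and 0 < dv < RV_LIMIT:
            return "1.4"  # P Cygni: absorption on the blue side
        if above and -RV_LIMIT < dv < 0:
            return "1.5"  # inverse P Cygni
        if below and 0 < abs(dv) < RV_LIMIT:
            return "2.3"
    return None


pc = {"Ng": 1, "Na": 1, "NFg": [1.5], "RVg": [100.0], "NFa": [0.7],
      "RVa": [-30.0], "NFr": 1.02, "NFb": 1.01}
print(classify(pc))  # 1.4
```

Within Class I and Class II the rules are mutually exclusive, so the order of the checks only matters for profiles that satisfy none of them.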

We compared the DS method, using the parameters listed in Table 2, with machine learning on both LAMOST Low-Resolution Spectroscopic Survey (LRS) and LAMOST-MRS spectra. The catalog of Zhang et al. (2022) lists 30,048 spectra of 25,886 stars identified by machine learning. In our comparison, DS identified 30,098 spectra from 24,199 stars as having emission lines, and all of these stars agree with the findings of Zhang et al. (2022). The discrepancy in the number of spectra may arise from repeated observations of some stars. Despite minor differences between the methods, our approach is more efficient because it does not require manual re-inspection of spectra. In addition, DS is more stable in detecting emission in repeated observations of individual stars.

We also attempted to replicate the machine learning method, using equivalent widths or FWHM to identify emission lines in LAMOST-MRS spectra. We divided 4375 samples into training and testing sets and tested the two machine learning methods that performed best on low-resolution spectra in Zhang et al. (2022): k-nearest neighbors (KNN) and random forest (RF). The test-set accuracies of KNN and RF are 0.878 and 0.907, respectively, much lower than the 0.997 and 0.989 obtained by Zhang et al. (2022) with LAMOST-LRS data. This may be due to selection effects and the higher resolution of the medium-resolution spectra. In Zhang et al. (2022), emission lines were detected only in O-, B-, and A-type stars, which may to some extent have enhanced the accuracy. Compared with LAMOST-LRS, LAMOST-MRS spectra show more morphological features in the Hα profile, which makes it difficult for features such as the equivalent width to describe the overall shape of the profile and thus reduces the accuracy.
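Why scalar features lose shape information can be seen by computing them directly. The sketch below computes an equivalent width and a crude FWHM on continuum-normalized flux; the sign convention (negative EW for net emission), the half-maximum pixel-span estimate, and all profile parameters are assumptions for illustration and may differ from the features used by Zhang et al. (2022). A P Cygni profile whose emission and absorption have equal areas yields an EW of about zero, indistinguishable from a featureless continuum.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal integral of y over x (explicit, so it works on any NumPy version)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def equivalent_width(wave, norm_flux):
    """EW = integral of (1 - F/Fc) d(lambda); negative for net emission (assumed convention)."""
    return trapezoid(1.0 - norm_flux, wave)


def emission_fwhm(wave, norm_flux):
    """Crude FWHM: span of pixels whose excess over the continuum is >= half the maximum excess."""
    excess = norm_flux - 1.0
    idx = np.flatnonzero(excess >= 0.5 * excess.max())
    return float(wave[idx[-1]] - wave[idx[0]])


def gauss(wave, mu, amp, sigma=1.0):
    """Gaussian line component; sigma in Å (illustrative)."""
    return amp * np.exp(-0.5 * ((wave - mu) / sigma) ** 2)


wave = np.linspace(6540.0, 6590.0, 5001)  # 0.01 Å pixels (illustrative)
single = 1.0 + gauss(wave, 6565.0, 0.5)
pcyg = 1.0 + gauss(wave, 6566.0, 0.5) - gauss(wave, 6564.0, 0.5)

ew_single, fwhm_single = equivalent_width(wave, single), emission_fwhm(wave, single)
ew_pcyg = equivalent_width(wave, pcyg)
print(ew_single, fwhm_single)  # about -1.253 and 2.34
print(ew_pcyg)  # about 0: emission and absorption cancel
```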

We randomly selected 1000 spectra from the DS catalog that potentially contain emission lines, and another 1000 spectra that DS identified as lacking emission lines, and used these samples to compare the two methods. Machine learning classified 1048 spectra as having emission lines. Among them, 157 spectra were classified by machine learning as having emission lines but identified by DS as lacking them. Conversely, 109 spectra were identified by DS as having emission lines but classified by machine learning as lacking them. We selected a few spectra from each category and plotted them in Figure 15. The Hα profiles identified as emission lines by machine learning but missed by DS, shown in Figure 15(a), resemble single absorption lines rather than emission lines. The opposite case is shown in Figure 15(b); each of these profiles has at least one emission peak.
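The counts above fix the full two-by-two cross-tabulation. The short calculation below makes it explicit; the overall agreement rate it prints is a derived quantity, not a figure stated in the text.

```python
# Counts quoted in the text
ds_pos, ds_neg = 1000, 1000  # samples DS flags as with / without emission
ml_pos_total = 1048          # spectra machine learning flags as emission
ml_pos_ds_neg = 157          # machine learning yes, DS no
ds_pos_ml_neg = 109          # DS yes, machine learning no

both_pos = ml_pos_total - ml_pos_ds_neg   # both methods find emission
both_neg = ds_neg - ml_pos_ds_neg         # both methods find none
assert both_pos + ds_pos_ml_neg == ds_pos  # consistency check on the quoted counts
agreement = (both_pos + both_neg) / (ds_pos + ds_neg)
print(both_pos, both_neg, agreement)  # 891 843 0.867
```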

Machine learning performs well in data-mining and binary classification tasks; consequently, DS did not show a significant advantage in detecting faint emission lines. This may be attributed to the relative immaturity of the classification criteria we currently use. Nevertheless, because it relies solely on parameters such as equivalent width and FWHM, the machine learning approach clearly cannot estimate the spectral profile or perform morphological classification as well as DS and Gaussian fitting. When the spectral profile is intricate, it faces a challenge similar to that of Gaussian fitting: a limited set of characteristic parameters, such as equivalent width and FWHM, cannot comprehensively characterize the structure of the spectral line, which ultimately leads to some misclassification. Herein lies the strength of DS. For spectra with profiles of any shape, we can readily differentiate them and identify the parameters of their peaks across the various orders of derivatives, which allows us to evaluate their spectral line structure.

Figure 13. Gaussian fit and DS of J084928.91+103601.5, J043114.44+271017.9, and J052847.59+342423.3.

Figure 15. (a) Examples identified as emission lines by machine learning but missed by DS. (b) Examples identified as emission lines by DS but missed by machine learning.


5. Conclusion 


DS offers a potential avenue for the morphological analysis of faint signal profiles in spectra. Applying it to the 579,680 co-added spectra of LAMOST-MRS, we found 16,629 spectra with emission features around the Hα line that may be related to the underlying physics of the observed objects. The wavelengths and amplitudes of peaks obtained from the first and third derivatives of the spectrum allow us to construct a simplified morphological classification. Using the scheme described in Table 3, we classify all of these spectra into nine subclasses.

Compared with Gaussian fitting, this method is more sensitive to faint signals. It localizes peaks precisely even when they have small amplitudes or are poorly resolved, providing accurate wavelength positions and amplitudes. Unlike machine learning, this method provides a more detailed and precise detection process: it can offer an initial assessment of the spectral profile based on the identified peak wavelengths and amplitudes. In fact, DS provides seven parameters describing the spectrum, which can be used to make a preliminary estimate of the spectral profile. Two directions are viable: using this estimate as a prior for spectral fitting, or using the current parameters as features and applying machine learning to refine our existing classification criteria. This work is beyond the scope of this paper but may be pursued in future studies.